What Clusters Are Generated by Normal Mixtures?

Author

  • Christian Hennig
Abstract

Model-based cluster analysis is often carried out by estimating the parameters of a normal mixture. But mixture components do not necessarily reflect the idea of a "cluster". I discuss how to formalize the concept of "clusters" with respect to probability distributions on the real line by means of fixed point clusters, i.e., sets that do not contain any outlier and with respect to which the rest of the real line consists of outliers. The concept is applied to some normal mixtures.

1 Theoretical clusters of distributions

There are many different goals of cluster analysis (CA), and therefore there are many different meanings and formal definitions of the term "cluster". Most of them have in common that a cluster should be a group of entities that is "homogeneous" in some manner, i.e., the entities are similar to each other, and that is in some manner "separated" from the rest.

In CA based on probability models, it is assumed that the entities (usually from $\mathbb{R}^p$; here I consider only $\mathbb{R}$) are generated by some probability distribution. The clusters of the data are determined by using estimators of features of the distribution (i.e., parameters or regions of high density). That is, the clusters found in the data can also be viewed as estimators of certain properties of the distribution, which I call "theoretical clusters". In order to explain the very meaning of the data clusters, it is necessary to think about the theoretical clusters generated by the considered model.

Often a CA is based on a normal mixture model of the form

$$P = \sum_{i=1}^{s} \pi_i \, N(b_i, \sigma_i^2), \qquad \pi_i > 0, \quad \sum_{i=1}^{s} \pi_i = 1, \qquad (1)$$

where $N(b, \sigma^2)$ denotes the normal distribution with mean $b$ and variance $\sigma^2$, the pairs $(b_i, \sigma_i^2)$, $i = 1, \ldots, s$, being pairwise distinct. Most of the following considerations also apply to normal fixed partition models. See Bock (1996) for literature concerning the use of these kinds of models in CA. How should the theoretical clusters of such normal mixtures be defined?
To my knowledge, three approaches have been used in the literature so far, more or less implicitly (see Bock (1996) for references):
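
The abstract's notion of a fixed point cluster (a set that contains no outliers and with respect to which everything else is an outlier) can be made concrete on the real line. The following is a minimal sketch under stated assumptions, not the author's definition: it assumes, hypothetically, that a point is an outlier with respect to a set C when it lies outside m(C) ± c·s(C), where m(C) and s(C) are the mean and standard deviation of the normal mixture (1) conditioned on C, and c = 3 is an arbitrary illustrative constant. The helper names `truncated_moments` and `fixed_point_cluster` are hypothetical.

```python
import math
from typing import List, Tuple

# A normal mixture on the real line as (weight, mean, sd) triples,
# i.e. P = sum_i weight_i * N(mean_i, sd_i**2) as in model (1).
Mixture = List[Tuple[float, float, float]]


def _phi(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _Phi(z: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def truncated_moments(mix: Mixture, a: float, b: float) -> Tuple[float, float]:
    """Mean and standard deviation of the mixture conditioned on [a, b]."""
    mass = m1 = m2 = 0.0
    for w, mu, sd in mix:
        alpha, beta = (a - mu) / sd, (b - mu) / sd
        p = _Phi(beta) - _Phi(alpha)
        # Partial moments E[X 1{a<=X<=b}] and E[X^2 1{a<=X<=b}] of N(mu, sd^2).
        e1 = mu * p + sd * (_phi(alpha) - _phi(beta))
        e2 = (mu * mu + sd * sd) * p + sd * (
            (a + mu) * _phi(alpha) - (b + mu) * _phi(beta))
        mass += w * p
        m1 += w * e1
        m2 += w * e2
    if mass <= 0.0:
        raise ValueError("interval carries no probability mass")
    mean = m1 / mass
    var = max(m2 / mass - mean * mean, 0.0)
    return mean, math.sqrt(var)


def fixed_point_cluster(mix: Mixture, a: float, b: float, c: float = 3.0,
                        tol: float = 1e-9, max_iter: int = 1000) -> Tuple[float, float]:
    """Iterate C -> [m(C) - c*s(C), m(C) + c*s(C)] until C stops changing.

    At the fixed point, the interval consists exactly of the points that are
    not outliers (under the assumed mean +/- c*sd rule) with respect to itself.
    """
    for _ in range(max_iter):
        mean, sd = truncated_moments(mix, a, b)
        new_a, new_b = mean - c * sd, mean + c * sd
        if abs(new_a - a) < tol and abs(new_b - b) < tol:
            return new_a, new_b
        a, b = new_a, new_b
    raise RuntimeError("no fixed point reached within max_iter iterations")


if __name__ == "__main__":
    mix = [(0.5, 0.0, 1.0), (0.5, 10.0, 1.0)]
    print(fixed_point_cluster(mix, -3.0, 3.0))    # one component only
    print(fixed_point_cluster(mix, -20.0, 30.0))  # both components together
```

For the well-separated mixture 0.5 N(0, 1) + 0.5 N(10, 1), starting at [-3, 3] converges to roughly ±2.95 around 0 and excludes the second component, while starting at [-20, 30] converges to about 5 ± 15.3, a single interval covering both components. Under this assumed outlier rule, the same mixture therefore has fixed point clusters that do and do not coincide with its components.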

Similar sources

Pattern Clustering by Multivariate Mixture Analysis.

Cluster analysis is reformulated as a problem of estimating the parameters of a mixture of multivariate distributions. The maximum-likelihood theory and numerical solution techniques are developed for a fairly general class of distributions. The theory is applied to mixtures of multivariate normals (NORMIX) and mixtures of multivariate Bernoulli distributions (Latent Classes). The feasibili...

Identifying Mixtures of Mixtures Using Bayesian Estimation

The use of a finite mixture of normal distributions in model-based clustering allows us to capture non-Gaussian data clusters. However, identifying the clusters from the normal components is challenging and in general either achieved by imposing constraints on the model or by using post-processing procedures. Within the Bayesian framework, we propose a different approach based on sparse finite ...

Optimality Theoretic Account of Acquisition of Consonant Clusters of English Syllables by Persian EFL Learners*

This study accounts for the acquisition of the consonant clusters of English syllable structures both in onset and coda positions by Persian EFL learners. Persian syllable structure is "CV(CC)", composed of one consonant at the initial position and two optional consonants at the final position; whereas English syllable structure is "(CCC)V(CCCC)". Therefore, Persian EFL learners need to resolve...

Breakdown Points for Maximum Likelihood Estimators of Location–scale Mixtures by Christian Hennig

ML-estimation based on mixtures of Normal distributions is a widely used tool for cluster analysis. However, a single outlier can make the parameter estimation of at least one of the mixture components break down. Among others, the estimation of mixtures of t-distributions by McLachlan and Peel [Finite Mixture Models (2000) Wiley, New York] and the addition of a further mixture component accoun...

Publication date: 2000